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Abstract 

I describe a Lanczos method to compute the Neuberger Operator and a multigrid 
algorithm for its inversion. 

1 Introduction 

Quantum Chromodynamics (QCD) is a theory of strong interactions, where the chiral 
symmetry plays a mayor role. There are different starting points to formulate a lat- 
tice theory with exact chiral symmetry, but all of them must obey the Ginsparg- Wilson 
condition [|l|: 

'j^D-^ + = a'j5a-\ (1) 

where a is the lattice spacing, D is the lattice Dirac operator and is a local operator 
and trivial in the Dirac space. 



*Talk given at the "Interdisciplinary Workshop on Numerical Challenges to Lattice QCD" , Wuppertal, 
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A candidate is the overlap operator of Neuberger 0: 

D = l- A{A^Ay^/\ A = M- aDw (2) 

where M is a shift parameter in the range (0, 2), which I have fixed at one and is the 
Wilson-Dirac operator, 

Dw = lT.b,{d; + d,)-ad;d,] (3) 

and dfj, and 9* are the nearest-neighbor forward and backward difference operators, which 
are covariant, i.e. the shift operators pick up a unitary 3 by 3 matrix with determinant 
one. These small matrices are associated with the links of the lattice and are oriented 
positively. A set of such matrices forms a "configuration". = 1,...,5 are 4 by 

4 matrices related to the spin of the particle. Therefore, if there are N lattice points, 
the matrix is of order 12A^. A restive symmetry of the matrix A that comes from the 
continuum is the so called 75 — symmetry which is the Hermiticity of the 75^4 operator. 

The computation of the inverse square root of a matrix is reviewed in [Q. In the 
context of lattice QCD there are several sparse matrix methods, which are developed 
recently |^, ^, ^ |^. I will focus here on a Lanczos method similar to Q. For a more 
general case of functions of matrices I refer to the talk of H. van der Vorst, and for a 
Chebyshev method I refer to the talk of K. Jansen, both included in these proceedings. 

2 The Lanczos Algorithm 

The Lanczos iteration is known to approximate the spectrum of the underlying matrix in 
an optimal way and, in particular, it can be used to solve linear systems [Q. 
Let Qn = ill, ■ ■ ■ , Qn] be the set of orthonormal vectors, such that 

ylUQ, = Q„T„ + /5„g„+i(ei")f, qi = pib, Pi = 1/||6||2 (4) 

where T„ is a tridiagonal and symmetric matrix, b is an arbitrary vector, and jSn a real 
and positive constant, e^-* denotes the unit vector with n elements in the direction m. 
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By writing down the above decomposition in terms of the vectors qi,i = 1, . . . , n and 
the matrix elements of T„, I arrive at a three term recurrence that allows to compute these 
vectors in increasing order, starting from the vector qi. This is the Lanczos Algorithm: 

Po = 0, pi = l/\\b\\2, go = 0, qi = Pib 
for i = 1, . . . 
V = A^Aqi 



OLi = qjv 



(5) 



V ■=v - qiUi - gj-iA-i 

A = \\v\\2 

if Pi < tol, n = i, end for 
qi+i = v/pi 

where tol is a tolerance which serves as a stopping condition. 

The Lanczos Algorithm constructs a basis for the Krylov subspace 0: 

span{6, A^Ab, (AU)"-^^} (6) 

If the Algorithm stops after n steps, one says that the associated Krylov subspace is 
invariant. 

In the floating point arithmetic, there is a danger that once the Lanczos Algorithm 
(polynomial) has approximated well some part of the spectrum, the iteration reproduces 
vectors which are rich in that direction |^. As a consequence, the orthogonality of the 
Lanczos vectors is spoiled with an immediate impact on the history of the iteration: if 
the algorithm would stop after n steps in exact arithmetic, in the presence of round off 
errors the loss of orthogonality would keep the algorithm going on. 

3 The Lanczos Algorithm for solving Ax = h 

Here I will use this algorithm to solve linear systems, where the loss of orthogonality will 
not play a role in the sense that I will use a different stopping condition. 
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I ask the solution in the form 

X = QnVn (7) 

By projecting the original system on to the Krylov subspace I get: 

QlA^Ax = Qib (8) 

By construction, I have 

b = Q„eS"Vpi, (9) 
Substituting x = QnUn and using (|), my task is now to solve the system 

TnVn = eS"Vpi (10) 

Therefore the solution is given by 

x = Q„T-M"Vpi (11) 

This way using the Lanczos iteration one reduces the size of the matrix to be inverted. 
Moreover, since T„ is tridiagonal, one can compute ?/„ by short recurences. 
If I define: 

ri = b- A^Axi, Qi = Pin, fji = piXi (12) 
where i = 1, . . ., it is easy to show that 

Pi+iPi + piUi + Pi_i A-i = 

(13) 

qi + yi+i(3i + Viai + = 

Therefore the solution can be updated recursively and I have the following Algorithml 
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for solving the system A^Ax = b: 



= 0, pi = I/II&II2, go = 0, qi = pib 
for i = 1, . . . 

V = A^Aqi 
ai = q\v 

v:=v - qitti - qi-iPi-i 

Pi = ll^lb 
gi+i = v/pi 



(14) 



ft 

^ i+1 •= Qi+l/ Pi+l 
^i+l '■= Vi+l/ Pi+l 

if , ^ , < toL n = i, end for 
•' \pi+i\ ' ' •' 

4 The Lanczos Algorithm for solving {A^AY^^x = b 

Now I would like to compute x = (A'^ A)~^^'^b and still use the Lanczos Algorithm. In 
order to do so I make the following observations: 

Let (A'^A)'^^'^ be expressed by a matrix- valued function, for example the integral 
formula 

(^t^)-l/2 _ ± / ^^^^2 ^ ^t^)-l (15) 
71 Jo 



From the previous section, I use the Lanczos Algorithm to compute 

(A^Ay'b = QnT-'e^r^/p, (16) 

It is easy to show that the Lanczos Algorithm is shift-invariant, i.e. if the matrix 
A'^A is shifted by a constant say, t^, the Lanczos vectors remain invariant. Moreover, the 
corresponding Lanczos matrix is shifted by the same amount. 

This property allows one to solve the system (t^ + A'^A)x = b hj using the same 
Lanczos iteration as before. Since the matrix (t^ + A'^A) is better conditioned than A^A, 



it can be concluded that once the original system is solved, the shifted one is solved too. 
Therefore I have: 

it' + A^A)-'b = Qn{f + T„)-'eS"Vpi (17) 
Using the above integral formula and puting everything together, I get: 

a; = (AU)-i/26 = Q„T„-i/M"Vpi (18) 
There are some remarks to be made here: 

a) As before, by applying the Lanczos iteration on AM, the problem of computing 
{A^A)~^^'^b reduces to the problem of computing yn = T'^/^Ci^Vpi which is typically a 
much smaller problem than the original one. But since T^/^ is full, ?/„ cannot be computed 
by short recurences. It can be computed for example by using the full decomposition of 
Tn in its eigenvalues and eigenvectors; in fact this is the method I have employed too, for 
its compactness and the small overhead for moderate n. 

b) The method is not optimal, as it would have been, if one would have applied it 
directly for the matrix (AM)^/^. By using A'^A the condition is squared, and one looses 
a factor of two compared to the theoretical case! 

c) From the derivation above, it can be concluded that the system {A'^AY^'^x = 6 is 
solved at the same time as the system A'^Ax = b. 

d) To implement the result (|1^) , I first construct the Lanczos matrix and then compute 

2/n = T„-^/M"Vpi (19) 

To compute x = QnUn, I repeat the Lanczos iteration. I save the scalar products, though 
it is not necessary. 
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Therefore I have the following Algorithm2 for solving the system {A^Ay^^x — b: 

Po = 0, pi = I/II&II2, qo = 0, ?i = Pib 
/or ^ = 1, . . . 

V = A'^Aqi 

Pi = ll^lb 

Qi+l = v/Pi 

^ _ Pi'>i+Pi-ift-i 

A+1 - Wi — 



^■fj^l < "^^h end for 

(20) 

Set {Tn)i,i = ai, {Tn)i+i,i = {Tn)i,i+i = A, otherwise {Tn)i,j = 

?0 = 0, ?1 = 2^0 = 

/or i = 1, . . . ,n 

Xi ^Xi-i + qiul^^ 

V = A^Aqi 

V :=v - qiai - qi-iPi-i 
qi+i = v/pi 

where by o I denote a vector with zero entries and C/„, A„ the matrices of the egienvectors 
and eigenvalues of T^. Note that there are only four large vectors necessary to store: 

qi-i,qi,v,Xi. 

5 Testing the method 

I propose a simple test: I solve the system A^Ax = 6 by applying twice the Algorithm2, 
i.e. I solve the linear systems 

{A^Ay/^z = b, {A^Ay/^x^z (21) 



in the above order. For each approximation Xj, I compute the residual vector 

ri = b- A^Axi (22) 

The method is tested for a SU(3) configuration at /5 = 6.0 on a 8^16 lattice, corre- 
sponding to an order 98304 complex matrix A. 

In Fig.l I show the norm of the residual vector decreasing monotonically. The stag- 
nation of ||rj||2 for small values of tol may come from the accumulation of round off error 
in the 64-bit precision arithmetic used here. 

This example shows that the tolerance line is above the residual norm line, which 
confirms the expectation that tol is a good stopping condition of the Algorithm2. 



6 Inversion 

Having computed the operator, one can invert it by applying iterative methods based 
on the the Lanczos algorithm. Since the operator D is normal, it turns out that the 



Conjugate Residual (CR) algorithm is the optimal one JTO . 

In Fig. 2 I show the converegence history of CR on 30 small 4'^ lattices at /3 = 6. The 
large number of multiplications with D^/ suggests that the inversion of the Neuberger 
operator is a difficult task and may bring the complexity of quenched simulations in lattice 
QCD to the same order of magnitude to dynamical simulations with Wilson fermions. 

Therefore, other ideas are needed. 

The essential point is the large number of small eigenvalues of A that make the com- 
putation of D time consuming. Therefore, one may try to project out these modes and 
invert them directly 



Also, one may try 5— dimensional implementations of the Neuberger operator, such 



that its condition improves 12 



I have tried also to reformulate the theory in 5 dimensions by using the corresponding 
approximate inversion as a coarse grid solution in a multigrid scheme |jT3l . 
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The scheme is tested and the results are shown in Fig. 2, where the multigrid pattern 
of the residual norm is clear. The gain with respect to CR is about a factor 10. 

Note that to invert the "big" matrix I have used the BiCGstab2 algorithm which 



is almost optimal in most of the cases for the non-normal matrices as its is the matrix Ai 
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Test of the Lanczos Algorithm for the Inverse Square Root: (A^A) b 

10° I 1 , 1 , , 1 
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Figure 1: The dots show the norm of the residual vector, whereas the hue shows the 
tolerance level set by tol in the Algorithm2. 
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Figure 2: Norm of the residual error vs. the number of Dw multipHcations on 30 con- 
figurations. Circles stand for the straightforward inversion with CR and stars for the 
multigrid algorithm. 
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